-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add doc for "Controlling Disruption", document SafeToEvict
#2924
Conversation
Build Failed 😱 Build Id: 8bd3d516-e352-4373-9674-2615ee3a71d3 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
b2cdca8
to
266f35a
Compare
Build Failed 😱 Build Id: 694bc201-3bcc-4c87-8b65-13be3c526568 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
266f35a
to
a811225
Compare
Build Succeeded 👏 Build Id: 1980b9b5-c016-4e68-89c5-7371492cf5e6 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
a811225
to
e6647a2
Compare
Build Succeeded 👏 Build Id: f534a55a-ffdc-4839-9b01-57f3380c65b1 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
@@ -97,6 +97,12 @@ This affects the Cluster autoscaler, Allocation Scheduling, Pod Scheduling and F | |||
|
|||
#### Cluster Autoscaler | |||
|
|||
{{< alert title="SafeToEvict Feature Gate" color="info" >}} | |||
The [Alpha]({{< ref "/docs/Guides/feature-stages.md#alpha" >}}) `SafeToEvict` feature allows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to best phrase this, but we want to convey that it provides a better way to control disruption than the annotation (which is only for the autoscaler) as it allows the game to establish a contract about it's behavior.
Maybe you cover this more in the longer doc - reviewing this small change first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to phrase it a little better. WDYT about actually just providing an eviction
sample here? i.e. split the samples up with
Using the eviction
API
The hard way
or something?
|
||
* **Ten minutes:** Cluster Autoscaler respects [ten minutes of graceful termination](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#does-ca-respect-gracefultermination-in-scale-down) on scale-down. On some cloud products, you can configure `--max-graceful-termination-sec` to change this, but it is not advised: Cluster Autoscaler is currently only capable of scaling down one node at a time, and larger graceful termination windows slow this down farther (see [autoscaler#5079](https://github.com/kubernetes/autoscaler/issues/5079)). If the ten minute limit does not apply to you, generally you should choose between `safe: Always` (for sessions less than an hour), or see [below](#considerations-for-long-sessions). | ||
|
||
* **One hour:** On many cloud products, `PodDisruptionBudget` can only block node upgrade evictions for a certain period of time - on GKE this is 1h. After that, the PDB is ignored, or the node upgrade fails with an error. Controlling `Pod` disruption for longer than one hour requires cluster configuration changes outside of Agones - see [below](#considerations-for-long-sessions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know what the PBD timeout is on other cloud providers?
On GKE 1h is also the max time given for graceful termination before a pod is forcefully removed during node drain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell gleaning other providers documentations, I think they err on the side of erroring rather than proceeding. I am still trying to research this.
This adds an advanced doc for "Controlling Disruption", documenting the `SafeToEvict` Alpha.
e6647a2
to
6c078ce
Compare
Build Succeeded 👏 Build Id: f68a50d2-7eea-4438-bb21-207afb2402d0 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
PTAL - restructured this a bit. |
Build Failed 😱 Build Id: 3f5e9d61-f4a8-4261-a3cd-7afd20b1555d To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 24261da8-0f7e-4b33-9e39-8a2e528b443e The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
PTAL, updated to include a flowchart diagram and reworked some language. |
Build Failed 😱 Build Id: a82b94ce-524f-4447-ba25-798291806986 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 46b58fcf-26b6-4831-9658-421bd9bb0bc1 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: roberthbailey, zmerlynn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
New changes are detected. LGTM label has been removed. |
Build Succeeded 👏 Build Id: fc515088-df4f-4ae3-9422-d52cf18958e4 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
This adds an "Advanced" doc for "Controlling Disruption", documenting the
SafeToEvict
Alpha.